Expansion algorithm for the density matrix 



Anders M. N. Niklasson 
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA 

(February 1, 2008) 



(N ■ 

o : 

o , 

O ■ 



C/3 



X3 
O 

o 



> 

O 

vn 
a\ 
o 

(N 
O 



I 

O 

o 



X 



A purification algorithm for expanding the single-particle 
density matrix in terms of the Hamiltonian operator is pro- 
posed. The scheme works with a predefined occupation and 
requires less than half the number of matrix-matrix multipli- 
cations compared to existing methods at low (< 10%) and 
high (> 90%) occupancy. The expansion can be used with 
a fixed chemical potential in which case it is an asymmetric 
generalization of and a substantial improvement over grand 
canonical McWeeny purification. It is shown that the com- 
putational complexity, measured as number of matrix multi- 
plications, essentially is independent of system size even for 
metallic materials with a vanishing band gap. 



I. INTRODUCTION 

Theoretical predictions of material properties of com- 
plex systems consisting of millions of atoms are often 
limited not by theory but by the calculational techniques. 
Recently there has been a large effort to develop ab-initio 
methods that computationally scale linearly with system 
size [1]. The techniques may play an important role in a 
broad spectrum of science such as molecular biology, ma- 
terials science, chemistry and nanotechnology. Several of 
the linear scaling schemes are based on the single-particle 
density matrix that can be used in order to calculate the 
energies and densities that occur in self-consistent field 
theories. The construction of the density matrix is used 
as an alternative to solving an eigenvalue problem. For 
large complex systems within a sparse matrix representa- 
tion this approach can be performed more efficiently and 
instead of a cubic scaling the computational cost scales 
linearly with the system size [1]. The matrix sparsity is 
essential for the success of density matrix schemes. For 
materials with a gap the real space representation of the 
density matrix is sparse [2-4] due to a finite interaction 
length, which is usually referred to as nearsightedness [5]. 
However, within other representations, such as a multi- 
resolution wavelet basis or a group-renormalization rep- 
resentation, the density matrix is sparse also for metallic 
systems [6-9]. 

Most techniques for constructing the density matrix 
can be seen as a polynomial expansion of the density 
matrix po in terms of the Hamiltonian operator H. In an 
iterative approach this expansion can be formulated as 



Pq = lim„^oo Xn- 



n = 1,2,. 



(1) 



The projection polynomials Pn{H, Xn) are chosen to 
achieve a rapid convergence under the conditions of com- 
mutation, [H,po] = 0, idempotency, p§ = po, and par- 
ticle conservation, Tr[po] = A'e. They may either be 
chosen from a constrained conjugate gradient minimiza- 
tion of the energy functional Tr[pH] [5,10-20], or as a 
fast expansion of the step function 0{fil — H) = pQ cen- 
tered at the chemical potential p, or, for finite tempera- 
tures, the Fermi-Dirac distribution [9,19-25]. Each com- 
putational step consists of matrix-matrix additions, sub- 
tractions and multiplications. The problem is to find a 
rapidly convergent expansion that minimizes the number 
of matrix-matrix multiplications, since these operations 
are the most time consuming [26,27]. 

The efficiency of the different density-matrix schemes 
varies depending on particular characteristics of the prob- 
lem such as the existence of a band gap, a predefined 
chemical potential, filling factor, self-consistency cycles, 
thresholding, basis set and system size. In this letter we 
propose a purification algorithm for the construction of 
the density matrix that is simple, general and rapidly 
convergent also for very large metallic systems. The 
method works with a predefined occupation and does 
not need the input or adjustment of the chemical po- 
tential. Only one previous purification strategy, recently 
developed by Falser and Manolopoulos (FM) [24], exists 
for this important problem. By using a starting guess 
Xq with the trace equal to the occupation number and 
thereafter performing trace-conserving spectral projec- 
tions, Xn converges to the correct density matrix with- 
out prior knowledge of the chemical potential. The PM 
scheme has an excellent performance compared to other 
methods [24,20]. However, due to the constraint of trace 
conservation the method is inefficient at low and high 
partial occupancy. This is of great concern, for example, 
when using a multi-resolution wavelet representation for 
metallic problems, since the fractional filling in this case 
is low. The same problem occurs with a minimal basis 
set at both high and low occupancies. A simple general 
algorithm that avoids this particular problem, and still 
converges as or more rapidly, especially for very large 
problems, would therefore be of great interest. 



II. TRACE CORRECTING PURIFICATION 

The method we propose is based on the continuously 
increasing purification polynomials with stationary end 
points in [0, 1] 
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pi") (x) = 1 - (1 - x)™ [1 + mx] 
P^\x)=x"' [l+m{l-x)]. 



(2) 



Two examples of Pm\x) and Pm\x), for m = 1 and 3, 
arc displayed in Fig. 1. It can be shown that any com- 
bination of these polynomials in an iterative expansion 
converges to a step function for x G [0, 1], i.e. 



(3) 



with the step ^ e 1 — Pm]- Here (3m is the inflection 
point of Pt\x), i.e. where P^\Pm) = /3rrx, < /3„ < 1, 
and (1 — Pm) is the inflection point of Pm{x). The con- 
vergence towards a step function can be understood from 
the fact that for each new iteration the new function will 
still be continuously increasing, but with an in creasing 
number of vanishing derivatives at the end points. The 
asymmetry in the number of vanishing derivatives de- 
termines the position of the step. The choice m = 2 
corresponds to the McWeeny polynomial [21]. In this 
symmetric case (32 = 1/2, and a step can only be formed 
at 5 = 0.5. The occupation of an operator X can be 
modified such that 



f Tr[P^\x)] > Tr[X]; e(X) G [/3„,1] 
[ Tr[P^\x)] < Tr[X]; e{X) G [0, 1 - /3„ 



(4) 



where s{X) arc the eigenvalues of X. For m = 1 the 
reverse situation holds, with switched inequalities. With 
m 7^ 2 we can apply the polynomials of Eq. (2) in the 
expansion of the density matrix, Eq. (1), such that each 
step adjusts for the occupation of X„. In this way an 
expansion is created that converges to the density matrix 
with the correct occupation, i.e. po = ^if^I ^ H) with 
Tr\po\ = Ne, but without a priori knowledge of fx. The 
algorithm (for m > 2) is given by this pseudocode: 



function po{H, N^., ErrorLimit) 
estimate eo{H), sn{H) 

Xo - (1 - 2/3„)(eAr/ - H)/{SN - £o) + Pml 

while Error > ErrorLimit 
if Tr[Xn] -Ne<0 

Xn+1 = Pm\Xn) 

else 



(5) 



Pm ^ (^n ) 



end 

estimate Error 



Po = Xn . 

For rn = 1 the trace condition has to be reversed to " > " . 

The scheme can be described as follows: First the 
Hamiltonian is normalized to an initial matrix Xq with 
all its eigenvalues e(Xo) G [(3m, 1 — /3m]- The constants 
eo and en are the lowest and highest eigenvalues of H, 



respectively. These can be approximated by, for exam- 
ple, Gersgorin estimates or the Lanczos method, with 
only a small extra computational cost [20,24]. A neces- 
sary criterion for convergence is that the unknown chem- 
ical potential of the normalized initial matrix /x(Xo) G 
[/3to,1 — Prn\- For intermediate occupancy, provided 
/i(Xo) G [(3m, 1 — /3m], /3m Can be set to zero in the start- 
ing guess. This usually reduces the number of iterations 
by one or two steps. The improvement has not been used 
in the present study. After initializing the projections 
Pm\Xn) or Pm {Xn) are performed, adjusting the occu- 
pancy and expanding a step function at the same time. 
The iteration stops when some appropriate error estimate 
is less than a predefined error limit. Note, that a high or- 
der expansion may lead to too fast convergence, making 
the adjustment of the occupation impossible. We may 
also use combinations with different values of m as well 
as other asymmetric purification polynomials. Any set of 
asymmetric continuously increasing polynomials in [0, 1] 
with stationary points at and 1 can be used equiva- 
lently. The presented algorithm cannot handle problems 
with degenerate eigenstates at /i. The algorithm would 
still converge, but to the wrong density matrix, since 
the degenerate states would split due to numerical noise. 
This differs the presented trace correcting purification 
scheme from the PM scheme, which correctly can treat 
the case of degeneracy. 



III. GRAND CANONICAL PURIFICATION 

Since any combination of the expansion polynomials in 
Eq. (2) converges to a step function we can use a prede- 
fined fixed expansion combination of 9{^il — H). In this 
case /X must be known, but the efficiency might be slightly 
improved compared to the schemes working with a prede- 
fined occupancy. For example, we may use the repetitions 
of the combination P^^ix) = P2''\Pi''\Pt\^))) or (^r 



m > 1) only Pm \x), or only Pm'(x), in the combination 



P^m\x) /i>l/2 
P^^\x) fL < 1/2, 



(6) 



where p. is the normalized chemical potential, fl = {fj, — 
£o){£n — eo)~^- We can now, as in Eq. (1), perform 
the expansion using the fixed repeated combination of 



Pm^ (x) and Pm^ (x) with the starting guess 

Xa = a{iil -H) + (31, 

a = min{/3[e7v - (1 - /3)[a* - ^o]"^} 



(7) 



The constant (3 is here determined by the inflection 
point of the repeated fixed polynomial combination, e.g. 
(3m or (1 — prn) for Pm'"{x,i2). The approach can 
be seen as an asymmetric generalization of the grand 
canonical McWeeny purification scheme [21,24,25]. With 
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P^'~^{x, p) they are equivalent. The method is directly re- 
lated to matrix sign function expansions [23] . The matrix 
sign function expansion is equivalent to the purification 
scheme via a trivial linear transform where the step func- 
tion expansion is performed between —1 and in the 
interval [—1, 1]. 

If the chemical potential is unknown the density ma- 
trix may have to be recalculated with different values of 
H until the occupation is correct. However, since the den- 
sity matrix can be described as a superposition of outer 
products of the occupied eigenstates we can adjust the 
occupation by adding or subtracting Hamiltonian eigen- 
states close to the chemical potential. A few of these 
states can be calculated efficiently, for example, using 
inverse power iterations [28]. In this way the occupa- 
tion can be adjusted without a complete recalculation of 
the density matrix with a new shifted chemical poten- 
tial. Moreover, in the case of a material with a gap we 
do not need to have a very precise prior knowledge of fi as 
long as the estimate is somewhere in the gap. The grand 
canonical approach, with a predefined fixed fi, may thus 
be an efficient alternative to trace correcting purification 
in some special cases. 

IV. EXAMPLES 

To illustrate the efficiency of the expansion techniques 
we have constructed an N x N Hamiltonian test matrix 
H{i, j) with randomized off-diagonal elements decaying 
as |«— j'l^^, and with a uniform distribution of eigenvalues 
in [0, 1]. Only the eigenvalue distribution of the Hamil- 
tonian is of importance for the convergence. With H we 
have that the occupation factor A = N^/N = p, = fi and 
it is easy to compare grand canonical schemes with the 
trace correcting or trace conserving methods. 

Figure 2 shows the number of matrix multiplications 
necessary to achieve an error ||X„ — polb < 10^^ as a 
function of the filling factor A or chemical potential fx. 
The PM trace conserving purification scheme is slow at 
low and high occupancy (since the slope at the inflec- 
tion point tends to 1 as the inflection point approaches 
or 1), whereas the new trace correcting expansion algo- 
rithm, with TO = 1 (Pi), TO = 3 (P3), and to = 5 (P5), 
has an overall fast convergence. For example, at 10% oc- 
cupancy the new scheme with to = 3 is about twice as 
fast compared to the PM scheme. There are essentially 
three reasons for the improved convergence: i) a more 
optimized ratio between the number of matrix multipli- 
cations and the polynomial order, ii) a, faster increase of 
the number of vanishing derivatives at the stationary end 
points, and in), as will be shown below, a steeper slope at 
the inflection points, partly due to the asymmetry of the 
puriflcation polynomials. The generalized grand canoni- 
cal expansion with to = 3 (Pa'^) converges several steps 



faster compared to the grand canonical McWeeny (McW) 
method for the same reasons. 

V. SCALING 

By varying A'^, i.e. the size of the Hamiltonian test 
matrix, we may see how the number of matrix multi- 
plications necessary for convergence scales with system 
size, or equivalcntly with the inverse gap at the chemical 
potential Ae^ = 1/N. The behavior is crucial for very 
large systems, especially if we wish to construct an ex- 
pansion scheme that computationally scales linearly with 
the system size for metallic materials with a vanishing 
band gap. Figure 3 displays the number of necessary 
matrix multiplications as a function of In(A^). In the up- 
per graph the results of McWeeny purification (McW) 
and the trace conserving canonical puriflcation (PM) are 
on top of each other. For this particular symmetrical 
case the two schemes are identical [24]. The graph indi- 
cates a stepwise linear relationship between the number 
of matrix multiplications M necessary for convergence 
and the logarithm of the system size. The relation can 
be approximated by the linear formula 

M{fi,N) = a{p) + K{p)lii{N). (8) 

The least square fits of M{fj,, N) are shown together with 
the values of k. The expansions Pi, P5 and P3"-' perform 
equally well or slightly worse compared to P3, and are 
not shown. The slopes determine the efficiency for very 
large systems and we find the best scaling for P3. 

The convergence is determined by the slowest converg- 
ing eigenvalue 7(^0), closest above or below the chemical 
potential of the normalized initial matrix ij,{Xq). This 
particular eigenvalue should cither converge to 1 or 0. 
Since the purification polynomials are continuously in- 
creasing, preserving the order of the eigenvalues, all other 
eigenvalues converge faster. In the case of grand canoni- 
cal purification, with a uniform distribution of N eigen- 
values, 7(A'', Xq) w /3 ± 1/{2N). By means of a lineariza- 
tion of the purification polynomial around (3 it can be 
shown that 7(iV, Xq) k, ^[kN, Xi), where k is the deriva- 
tive of the purification polynomial at the inflection point, 
e.g. k = P^^'(/3m,^). Thus, increasing the number of 
states by AA^ = A^(fc — 1) and performing one extra iter- 
ation leads to the same error in 7. If is the number of 
multiplications necessary to achieve a fixed error of 7 and 
p is the number of matrix multiplication in one iteration, 
we have that 

[M^{N + AN) - M^{N)]/{AN) « p[N{k - (9) 
Let My be a continuous version of Mj such that 

dM^/dN = p [N{k - 1)]"^ . (10) 
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Integration gives 

M^{N)=a+-^HN), (11) 

for some constant a. This approximate formula explains 
the linear relation in Eq. (8) and is a useful measure in 
optimizations of grand canonical expansions. Purifica- 
tion polynomials should be optimized on the criterion of 
matrix multiplications v.s. vanishing derivatives at the 
stationary fixed points, and the slope Kcst = p{k — ^)~^ 
as estimated in Eq. (11). For example, the purification 
P3*^*^(X) requires two matrix multiplications in each it- 
eration, it has three vanishing fixed point derivatives, and 
Kest = 3.1. This should be compared to McWeeny pu- 
rification that also requires two matrix multiplications in 
each iteration, but with only two vanishing fixed point 
derivatives, and with a Kest = 4.0. 

For materials with a band gap, the value 1/A'' should 
be replaced by the gap at the chemical potential Ae^. 
In this case the scaling with the logarithm of the sys- 
tem size vanishes and the number of necessary matrix 
multiplication for a predefined convergence accuracy is 
constant. 

Notice that one may use different criteria for conver- 
gence such as the error per state, per atom, or the to- 
tal error. However, this has only a minor effect on the 
number of necessary matrix multiplications because of 

the very rapid rate of convergence close to idempotency, 
which, for example, is quadratic in the WcWeeny case. 

VI. FIRST PRINCIPLES PERFORMANCE 

To further illustrate the performance of the expansion 
scheme we show the result of an implementation in the 
MondoSCF suite of linear scaling self-consistent field pro- 
grams [17,29-31]. Figure 4 displays the mmiber of nec- 
essary matrix multiplications for clusters and strings of 
Li atoms and for different molecules. In the case of Li 
clusters and strings of Li atoms the systems arc metallic 
in the sense that the gap vanishes in the limit of infinite 
number of atoms. This is thus a good test to check the 
linear relationship between the number of matrix multi- 
plications and the logarithm of the inverse band gap, i.e. 
the logarithm of the system size for metallic materials. 
The Pi scheme is efficient compared to the PM scheme, 
especially in the case of SiF4 where the occupancy A is 
high. This particular example illustrates the inefficiency 
of the PM scheme at high and low occupancy which is 
avoided with the trace correcting algorithm. The weak 
logarithmic dependence between computational cost and 
system size for metallic systems, illustrated by the dashed 
lines in the figure, is also confirmed. Notice that an ac- 
tual linear scaling is reached only if the number of non 
zero elements of the density matrix grows linearly with 
system size. This can generally not be achieved within a 



real-space representation for metallic systems. Instead, 
as mentioned above, a multi-resolution wavelet basis or a 
group-renormalization approach has to be applied. How- 
ever, the number of matrix multiplication necessary for 
convergence should not be affected by a change of rep- 
resentation since the vanishing gap around the chemical 
potential will remain the same regardless of the represen- 
tation, given, in the case of a wavelets, via a biorthogonal 
transformation of the Hamiltonian [8,9]. 

The Pi scheme implemented in the MondoSCF pro- 
grams has two major advantages compared to other 
schemes: i) It requires less memory compared to higher 
order schemes since only second order polynomials are 
used and intermediate matrix products does not have to 
be stored, ii) It is less complex and only matrix squares 
has to be calculated. A specially designed algorithm for 
matrix squares can possibly be made more efficient than 
a general matrix product algorithm. 

VII. DISCUSSION 

The expansion scheme and the convergence analysis 
illustrated and argued for here provide a basis for the 

understanding of purification algorithms and their effi- 
ciency, and it shows that the computational complexity, 
as measured in number of matrix multiplications, essen- 
tially is independent of system size even for metallic sys- 
tems. This is in contrast to, for example, some conjugate 
gradient schemes, where in the worst case, the maximal 
number of iterations was shown to scale as N'^, with c 
varying between 1/3 for insulators and 1 for metals [32]. 
However, if the additional problem of thresholding is in- 
cluded, which can be performed either via a finite cut-off 
radius truncation, or via a numerical threshold, the com- 
putational complexity as a function of system size, within 
some required numerical accuracy, becomes far more dif- 
ficult to analyze and practical experience may be the only 
way to understand the efficiency. 

In the alternative construction of the density matrix 
using a constrained functional minimization, as devised 
by Li et al. [11], the McWeeny purification is used to im- 
pose idempotency. The asymmetric polynomial expan- 
sions proposed here may serve as a possible alternative. 

VIII. SUMMARY 

In summary we have proposed an algorithm for ex- 
panding the single-particle density matrix in terms of 
the Hamiltonian that is simple, general, and with a com- 
putational complexity essentially independent of system 
size even for very large metallic systems with a vanishing 
band gap. If the expansion is used together with a fixed 
chemical potential it was shown to be an asymmetric gen- 
eralization of the grand canonical McWeeny purification. 
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The algorithm is a substantial improvement of previous 
schemes and provides together with the presented con- 
vergence analysis a framework for the understanding and 
optimization of purification. 
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FIG. 2. 

The number of matrix-matrix multiplications M nec- 
essary to achieve a convergence \\Xn — Polb < 10~^, as a 
function of the filling factor A, or equivalently, the chem- 
ical potential /z, for H with N = 100. PM corresponds 
to the rcsiilt using the canonical trace conserving pu- 
rification scheme by Falser and Manolopoulos [24]. The 
open symbols show the result of the expansion algo- 
rithm, Eq. (5). The small squares indicate the result of 
the grand canonical WcWeeny purification (McW) and 
P^'^ix,!!) inEq. (6). 



FIG. 3. 

The number of matrix-matrix multiplications M nec- 
essary to achieve a convergence \En — Eo\/N < 10~^, 
as a function of ln(A'') or equivalently ln(l/Ae^). Here 
En = Tr[HXn], N is the total number of states and As/^ 
is the gap at the chemical potential. 



FIG. 4. 

The number of matrix multiplications M (after three 

self-consistency cycles with ST0-3G or ST0-6G basis 
sets) necessary to achieve a convergence \En — i?n-i| < 
10"'' a.u., as a function of the logarithm of the number of 
atoms. Occupation A € [0.3, 0.7] except for NiF4. For the 
Li strings and clusters, as indicated by the dashed lines, 
the gap is vanishing, i.e. metallic, in the limit "Number 
of atoms" — > 00. 
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